connection: propagate I/O errors to network filters. by PiotrSikora · Pull Request #13941 · envoyproxy/envoy

PiotrSikora · 2020-11-09T01:42:51Z

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

PiotrSikora

This is my attempt at a fix for #13939. It's still a work in progress, so errors are caught only in raw_buffer_socket and there are no new tests yet, but I'd like to get feedback on the general direction.

cc @alyssawilk @antoniovicente @ggreenway @kyessenov

PiotrSikora · 2020-11-09T01:48:22Z

source/common/network/connection_impl.cc

+      stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails);
+      stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails);
+      stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
+    }


Those values are set on the upstream connection's stream info, and they are never propagated to the downstream connection's stream info or access logs. I'm not sure what the best way to solve that is.

Could the high level request object query the connection as part of the termination process to determine if termination was graceful or abrupt?

ggreenway

I'm in favor of distinguishing between closing due to error, or graceful.

For the transport socket signaling, one idea is to have another PostIoAction. But I'm not entirely sure that will make the interface any cleaner.

Should this change also include a new ConnectionEvent to send to the network filters?


PiotrSikora · 2020-11-10T01:47:11Z

> I'm in favor of distinguishing between closing due to error, or graceful.

> For the transport socket signaling, one idea is to have another PostIoAction. But I'm not entirely sure that will make the interface any cleaner.

Something like the updated commit?

> Should this change also include a new ConnectionEvent to send to the network filters?

The event is propagated to network filters via end_stream=true, and network filters can learn about the error via streamInfo()->getResponseFlags() (well, once I figure out how to approach the downstream/upstream stream info issue; any thoughts?).

Right now, there is no way to dispatch a ConnectionEvent about the upstream connection to network filters, and that's a much bigger change than this one (see #13940), so I'd prefer not to mix them together to avoid unnecessary delays.
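A minimal sketch of how a network filter could tell, once it receives end_stream=true, whether the close was abrupt by checking a response-flag bitmask. FakeStreamInfo, closedAbruptly(), and the flag values are hypothetical stand-ins, not Envoy's StreamInfo API or its real flag values. The sketch also assumes the flags are visible on the stream info the filter reads, which is exactly the upstream/downstream propagation problem still open above.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only; not Envoy's actual enum.
enum class ResponseFlag : uint64_t {
  UpstreamConnectionTermination = 1ULL << 0,
  DownstreamConnectionTermination = 1ULL << 1,
};

// Hypothetical stand-in for a stream-info object that stores flags as a bitmask.
class FakeStreamInfo {
public:
  void setResponseFlag(ResponseFlag flag) { flags_ |= static_cast<uint64_t>(flag); }
  bool hasResponseFlag(ResponseFlag flag) const {
    return (flags_ & static_cast<uint64_t>(flag)) != 0;
  }
  uint64_t responseFlags() const { return flags_; }

private:
  uint64_t flags_ = 0;
};

// What a filter might check when it sees end_stream=true: was this a graceful
// half-close, or did the transport socket report an I/O error?
bool closedAbruptly(const FakeStreamInfo& info) {
  return info.hasResponseFlag(ResponseFlag::UpstreamConnectionTermination) ||
         info.hasResponseFlag(ResponseFlag::DownstreamConnectionTermination);
}
```

With no flags set, closedAbruptly() returns false and the end_stream is treated as graceful; after setResponseFlag(ResponseFlag::UpstreamConnectionTermination) it returns true.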

PiotrSikora · 2020-11-10T01:50:26Z

(Note: a few tests are failing because of unmocked expectations for setting response flags, etc., so please ignore them for now.)


antoniovicente

Very interesting PR. I guess my feedback boils down to: Does it make sense to have both graceful and error close PostIoActions? A majority of the Close PostIoActions are clearly errors. The rest may also be errors.

antoniovicente · 2020-11-12T05:19:29Z

source/extensions/transport_sockets/alts/tsi_socket.cc


    end_stream_read_ = result.end_stream_read_;
-    read_error_ = result.action_ == Network::PostIoAction::Close;
+    read_error_ = result.action_ == Network::PostIoAction::CloseError;


Should this be:
result.action_ != Network::PostIoAction::KeepOpen;

Or consider renaming read_error_ to got_close_error_.

I guess one thing to keep in mind is that as of this PR RawBufferSocket can only return KeepOpen or CloseError from doRead. Should we consider all Close PostIoActions as errors?
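The two options for read_error_ differ only when a read returns the graceful close action. A minimal sketch of both predicates, assuming the three-valued enum discussed in this PR; IoResult is reduced to two fields for illustration and is not Envoy's full struct.

```cpp
#include <cassert>

// The three-valued enum under discussion (names follow this PR).
enum class PostIoAction { CloseGraceful, KeepOpen, CloseError };

// Simplified IoResult: only the fields this comparison needs.
struct IoResult {
  PostIoAction action_;
  bool end_stream_read_;
};

// As written in the diff: only an I/O error counts as a read error.
bool readErrorStrict(const IoResult& result) {
  return result.action_ == PostIoAction::CloseError;
}

// As suggested in review: any close action counts (i.e. "got close").
bool gotClose(const IoResult& result) {
  return result.action_ != PostIoAction::KeepOpen;
}
```

If RawBufferSocket's doRead() can only return KeepOpen or CloseError, both predicates agree for it; they diverge only for a socket that returns CloseGraceful from a read.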

antoniovicente · 2020-11-12T05:22:09Z

include/envoy/network/post_io_action.h

-  KeepOpen
+  KeepOpen,
+  // Close the connection because of an error.
+  CloseError,


It seems that a majority of Close cases are being renamed CloseError.

Should we also rename Close to CloseGraceful or similar as a way to improve our chances of catching all cases that need to be adjusted?
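The benefit of the rename is mechanical: once there is no plain Close enumerator, every remaining PostIoAction::Close call site stops compiling and has to choose explicitly. A minimal sketch of the renamed enum plus a switch without a default case, assuming GCC or Clang with -Wswitch (part of -Wall in GCC), which then warns at any switch that misses an enumerator. toString() is a hypothetical helper, not part of Envoy.

```cpp
#include <cassert>
#include <string_view>

enum class PostIoAction {
  // Close the connection gracefully (previously spelled `Close`).
  CloseGraceful,
  // Keep the socket open.
  KeepOpen,
  // Close the connection because of an error.
  CloseError,
};

// No `default:` on purpose, so the compiler can flag unhandled enumerators.
std::string_view toString(PostIoAction action) {
  switch (action) {
  case PostIoAction::CloseGraceful:
    return "CloseGraceful";
  case PostIoAction::KeepOpen:
    return "KeepOpen";
  case PostIoAction::CloseError:
    return "CloseError";
  }
  return "unknown"; // Unreachable for valid values; silences -Wreturn-type.
}
```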


antoniovicente · 2020-11-12T05:32:44Z

source/common/network/connection_impl.cc

+      stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails);
+      stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails);
+      stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
+    }


Could the high level request object query the connection as part of the termination process to determine if termination was graceful or abrupt?
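One way to read this suggestion is that the connection remembers how it closed and the higher-level object asks it during teardown, instead of relying on stream info being copied between the upstream and downstream sides. A minimal sketch of that shape; FakeConnection, FakeRequest, CloseType, and closeType() are all hypothetical and do not exist in Envoy under these names.

```cpp
#include <cassert>

enum class CloseType { NotClosed, Graceful, Abrupt };

// Hypothetical connection that records how its transport socket closed.
class FakeConnection {
public:
  void onTransportClose(bool io_error) {
    close_type_ = io_error ? CloseType::Abrupt : CloseType::Graceful;
  }
  CloseType closeType() const { return close_type_; }

private:
  CloseType close_type_ = CloseType::NotClosed;
};

// Hypothetical higher-level request object that queries the upstream
// connection during its own termination.
class FakeRequest {
public:
  explicit FakeRequest(const FakeConnection& upstream) : upstream_(upstream) {}
  bool terminatedAbruptly() const { return upstream_.closeType() == CloseType::Abrupt; }

private:
  const FakeConnection& upstream_;
};
```

The trade-off is lifetime: the request must still be able to reach the connection (or a copy of its close type) at the point where it decides how to log or clean up.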

antoniovicente · 2020-11-12T05:33:42Z

source/common/network/connection_impl.cc

+    } else {
+      stream_info_.setUpstreamTransportFailureReason(kUpstreamConnectionTerminationDetails);
+      stream_info_.setConnectionTerminationDetails(kUpstreamConnectionTerminationDetails);
+      stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);


nit: delegate the stats update to a virtual method so you can avoid the dynamic cast.
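A minimal sketch of the nit, assuming the dynamic cast distinguishes client (upstream) from server (downstream) connections: the shared error path calls a virtual hook, and each subclass records its own details. All class names and strings here are illustrative stand-ins, not Envoy's ConnectionImpl hierarchy.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the stream-info fields touched in the diff.
struct FakeStreamInfo {
  std::string termination_details;
  bool upstream_termination_flag = false;
  bool downstream_termination_flag = false;
};

class ConnectionBase {
public:
  virtual ~ConnectionBase() = default;
  // Shared read-error path: no dynamic_cast, the subclass decides what to record.
  void onReadError() { recordTerminationDetails(); }
  const FakeStreamInfo& streamInfo() const { return stream_info_; }

protected:
  virtual void recordTerminationDetails() = 0;
  FakeStreamInfo stream_info_;
};

class ServerConnection : public ConnectionBase {
protected:
  void recordTerminationDetails() override {
    stream_info_.termination_details = "downstream connection was terminated";
    stream_info_.downstream_termination_flag = true;
  }
};

class ClientConnection : public ConnectionBase {
protected:
  void recordTerminationDetails() override {
    stream_info_.termination_details = "upstream connection was terminated"; // illustrative
    stream_info_.upstream_termination_flag = true;
  }
};
```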

antoniovicente · 2020-11-12T05:36:39Z

source/common/network/connection_impl.cc

+      stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
+    }
+    // Force "end_stream" so that filters can process this error.
+    result.end_stream_read_ = true;


I usually associate end_stream_read_ with graceful termination. Is it appropriate to set this? Do we need alternate signaling to notify filters about abrupt terminations, via either additional arguments to the push pipeline or a new virtual method that is called on abrupt termination?

I agree; I think of end_stream as always a graceful operation. I think a different signal for error is more appropriate.

The issue is that there is no different signal right now, and if we don't set this, then network filters won't be executed at all.
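A simplified sketch of why forcing end_stream matters here, under the stated assumption that the read path only invokes filters when a read produced data or end of stream. IoResult is reduced to three fields and dispatchesToFilters() is a hypothetical stand-in for the connection's read handling, not Envoy's ConnectionImpl.

```cpp
#include <cassert>
#include <cstdint>

enum class PostIoAction { CloseGraceful, KeepOpen, CloseError };

// Simplified IoResult: only the fields needed for this sketch.
struct IoResult {
  PostIoAction action_;
  uint64_t bytes_processed_;
  bool end_stream_read_;
};

// Returns true if the (simulated) network filter chain would be invoked.
// Also reports whether an error was recorded and which end_stream filters see.
bool dispatchesToFilters(IoResult result, bool& error_recorded, bool& end_stream_seen) {
  if (result.action_ == PostIoAction::CloseError) {
    // In the PR, this is where the stream-info fields and response flag are set.
    error_recorded = true;
    // The workaround under discussion: force end_stream so filters still run.
    result.end_stream_read_ = true;
  }
  // Assumption: filters only run when the read produced data or end of stream.
  if (result.bytes_processed_ == 0 && !result.end_stream_read_) {
    return false;
  }
  end_stream_seen = result.end_stream_read_;
  return true;
}
```

Without the forced assignment, a CloseError read that returned no bytes would make this function return false, so filters would never hear about the error; that is the gap the reviewers want filled by a dedicated signal rather than end_stream.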

Note that this PR is meant to be a temporary fix until we have a proper solution for #13940, but that's going to require a lot more changes and time.

This seems like a massive hack. How much effort would it be to implement the real fix?

That depends on what you consider a real fix. I'm not too familiar with that part of the codebase, but if we want to propagate upstream connection events to network filters, then I imagine it would take quite a lot of changes, so I think it would be better if one of the maintainers could work on this.

@ggreenway

I don't know what to suggest. It would be good to figure out how to do a catch-all cleanup of WASM resources on abnormal connection termination. Could WASM detect connection termination based on an L4 filter that does the relevant signaling on destruction?

FWIW, we have a workaround in #13836 that works for Wasm.
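A minimal RAII sketch of the reviewer's suggestion: an L4 filter that reports cleanup exactly once, as graceful if it saw end_stream and as abrupt if the filter chain destroyed it first. CleanupOnDestroyFilter and its callback are hypothetical; this is not Envoy's Network::ReadFilter interface.

```cpp
#include <cassert>
#include <functional>
#include <utility>

class CleanupOnDestroyFilter {
public:
  using Callback = std::function<void(bool abrupt)>;

  explicit CleanupOnDestroyFilter(Callback on_done) : on_done_(std::move(on_done)) {}
  // Destroyed without having seen end_stream: treat as abrupt termination.
  ~CleanupOnDestroyFilter() { finish(/*abrupt=*/true); }

  CleanupOnDestroyFilter(const CleanupOnDestroyFilter&) = delete;
  CleanupOnDestroyFilter& operator=(const CleanupOnDestroyFilter&) = delete;

  // Graceful path: end of stream observed before destruction.
  void onData(bool end_stream) {
    if (end_stream) {
      finish(/*abrupt=*/false);
    }
  }

private:
  // Invokes the callback at most once.
  void finish(bool abrupt) {
    if (on_done_) {
      Callback cb = std::move(on_done_);
      on_done_ = nullptr; // moved-from std::function is unspecified; reset explicitly
      cb(abrupt);
    }
  }

  Callback on_done_;
};
```

The catch-all property comes from the destructor: whatever path tears the connection down, destroying the filter releases the resources.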

antoniovicente · 2020-11-12T05:42:46Z

source/extensions/transport_sockets/tls/ssl_socket.cc

      info_->state() != Ssl::SocketState::ShutdownSent) {
    PostIoAction action = doHandshake();
-    if (action == PostIoAction::Close || info_->state() != Ssl::SocketState::HandshakeComplete) {
+    if (action != PostIoAction::KeepOpen || info_->state() != Ssl::SocketState::HandshakeComplete) {


Also change the post actions in NotReadySslSocket to CloseError.

antoniovicente · 2020-11-12T05:46:56Z

source/extensions/transport_sockets/alts/tsi_socket.cc

    ENVOY_CONN_LOG(debug, "TSI: Handshake failed: end of stream without enough data",
                   callbacks_->connection());
-    return Network::PostIoAction::Close;
+    return Network::PostIoAction::CloseError;


There are 2 remaining uses of Network::PostIoAction::Close in this file, for the cases where a read returns 0 bytes during the handshake. Would it make sense to also consider those cases CloseError?

I changed one to CloseError and one to CloseGraceful (when the certificate validation failed, since that's not an I/O error).

@ggreenway

I guess the question is whether or not it is worth having the extra complexity that comes from having 2 close types when, out of the 3 remaining uses, 2 are handshake errors. I guess it's fine to use CloseGraceful for the 3 remaining cases.
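A sketch of one possible classification, following the split described in the reply above: a 0-byte read before the handshake finishes maps to CloseError, and a certificate validation failure maps to CloseGraceful because it is not an I/O error. HandshakeOutcome and toPostIoAction() are hypothetical names; under the reviewer's alternative, PeerClosedEarly would map to CloseGraceful as well.

```cpp
#include <cassert>

enum class PostIoAction { CloseGraceful, KeepOpen, CloseError };

// Hypothetical outcomes of a handshake step; names are illustrative only.
enum class HandshakeOutcome {
  InProgress,
  Complete,
  PeerClosedEarly,      // read returned 0 bytes before the handshake finished
  CertValidationFailed, // peer certificate rejected; not an I/O error
};

PostIoAction toPostIoAction(HandshakeOutcome outcome) {
  switch (outcome) {
  case HandshakeOutcome::InProgress:
  case HandshakeOutcome::Complete:
    return PostIoAction::KeepOpen;
  case HandshakeOutcome::PeerClosedEarly:
    return PostIoAction::CloseError;
  case HandshakeOutcome::CertValidationFailed:
    return PostIoAction::CloseGraceful;
  }
  return PostIoAction::CloseError; // Unreachable for valid values.
}
```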

antoniovicente · 2020-11-12T05:52:17Z

source/extensions/transport_sockets/tls/ssl_handshaker.cc

    default:
      handshake_callbacks_->onFailure();
-      return PostIoAction::Close;
+      return PostIoAction::CloseError;


Is the PostIoAction::Close on line 233 above a graceful or an error close?


antoniovicente · 2020-11-16T19:29:51Z

test/extensions/transport_sockets/tls/ssl_socket_test.cc

        server_connection->write(data, false);
        EXPECT_EQ(data.length(), 0);
-        // Calling close(FlushWrite) in onEvent() callback results in PostIoAction::Close,
+        // Calling close(FlushWrite) in onEvent() callback results in PostIoAction::CloseGraceful,


Seeing some failures in //test/extensions/transport_sockets/tls:ssl_socket_test, for example:

2020-11-14T01:05:04.6013231Z Uninteresting mock function call - taking default action specified at:
2020-11-14T01:05:04.6013980Z test/mocks/stream_info/mocks.cc:30:
2020-11-14T01:05:04.6014503Z Function call: setConnectionTerminationDetails("downstream connection was terminated")

2020-11-14T01:05:04.6024240Z unknown file: Failure
2020-11-14T01:05:04.6025476Z Uninteresting mock function call - taking default action specified at:
2020-11-14T01:05:04.6026043Z test/mocks/stream_info/mocks.cc:24:
2020-11-14T01:05:04.6026487Z Function call: setResponseFlag(16384)

Yeah, see my previous comment: I didn't add mock expectations for them, since that's a lot of extra noise in the PR for code that is still being discussed. Also, I'm not sure if all 3 values (connection termination details, upstream transport socket failure, and response flags) should be set.

antoniovicente · 2020-11-16T19:33:57Z

source/common/network/connection_impl.cc

+      stream_info_.setResponseFlag(StreamInfo::ResponseFlag::UpstreamConnectionTermination);
+    }
+    // Force "end_stream" so that filters can process this error.
+    result.end_stream_read_ = true;


@ggreenway

I don't know what to suggest. It would be good to figure out how to do a catch-all cleanup of WASM resources on abnormal connection termination. Could WASM detect connection termination based on an L4 filter that does the relevant signaling on destruction?

antoniovicente · 2020-11-16T19:35:44Z

source/extensions/transport_sockets/alts/tsi_socket.cc

    ENVOY_CONN_LOG(debug, "TSI: Handshake failed: end of stream without enough data",
                   callbacks_->connection());
-    return Network::PostIoAction::Close;
+    return Network::PostIoAction::CloseError;


@ggreenway

I guess the question is whether or not it is worth having the extra complexity that comes from having 2 close types when, out of the 3 remaining uses, 2 are handshake errors. I guess it's fine to use CloseGraceful for the 3 remaining cases.

github-actions · 2020-12-17T00:08:59Z

This pull request has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

github-actions · 2020-12-24T00:17:17Z

This pull request has been automatically closed because it has not had activity in the last 37 days. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

connection: propagate I/O errors to network filters.

232c0ff

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

PiotrSikora mentioned this pull request Nov 9, 2020

wasm: fix network leak #13836

Merged

PiotrSikora commented Nov 9, 2020


PiotrSikora mentioned this pull request Nov 9, 2020

Mid-stream I/O errors are silently ignored in TCP proxy and other network filters #13939

Closed

ggreenway reviewed Nov 9, 2020


antoniovicente self-assigned this Nov 9, 2020

PiotrSikora added 2 commits November 10, 2020 01:31

review: add PostIoAction::CloseError.

b62641f

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

Merge remote-tracking branch 'origin/master' into PiotrSikora/transpo…

8f58ce7

…rt_socket_errors

PiotrSikora requested review from antoniovicente and ggreenway November 10, 2020 01:58

review: fix format.

848af7d

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

PiotrSikora marked this pull request as ready for review November 10, 2020 02:48

PiotrSikora requested review from alyssawilk, asraa, htuch and lizan as code owners November 10, 2020 02:48

yanavlasov self-requested a review November 10, 2020 21:04

yanavlasov self-assigned this Nov 10, 2020

antoniovicente reviewed Nov 12, 2020


PiotrSikora added 3 commits November 12, 2020 10:40

Merge remote-tracking branch 'origin/master' into PiotrSikora/transpo…

ae3fc5b

…rt_socket_errors

Merge remote-tracking branch 'origin/master' into PiotrSikora/transpo…

4593733

…rt_socket_errors

Merge remote-tracking branch 'origin/master' into PiotrSikora/transpo…

8c5a5da

…rt_socket_errors

PiotrSikora mentioned this pull request Nov 13, 2020

Proxy-Wasm mega backport istio/envoy#284

Merged

PiotrSikora added 2 commits November 14, 2020 00:13

review: rename Close to CloseGraceful.

c46d41e

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

review: return CloseError from NotReadySslSocket.

6e04b67

Signed-off-by: Piotr Sikora <piotrsikora@google.com>

antoniovicente reviewed Nov 16, 2020


github-actions bot added the stale stalebot believes this issue/PR has not been touched recently label Dec 17, 2020

github-actions bot closed this Dec 24, 2020
